
Supplementary Material for Representation Learning for Optimal Individualized Treatments with Multivariate Outcomes

Neural Information Processing Systems

In this supplementary material, we describe in detail the simulation procedures, including all parameters, additional model-fitting details, and additional simulation results for Section 4.1 of the main paper. In this section, we describe the data-generating mechanism of Section 4.1 of the main paper. To learn the three latent domains in the correct directions, we control the direction of the estimated parameters for one item per latent domain.

Table A.1: Simulation parameters for the conditional distributions of observed items.
Table B.1: Accuracy of the fitted optimal treatment on the test set from 100 simulations for training sample sizes of 200, 500, 1000, and 2000.


Nesterov-Accelerated Robust Federated Learning Over Byzantine Adversaries

Xu, Lihan, Dong, Yanjie, Wang, Gang, Zeng, Runhao, Fan, Xiaoyi, Hu, Xiping

arXiv.org Artificial Intelligence

We investigate robust federated learning, where a group of workers collaboratively trains a shared model under the orchestration of a central server in the presence of Byzantine adversaries capable of arbitrary and potentially malicious behaviors. To simultaneously enhance communication efficiency and robustness against such adversaries, we propose a Byzantine-resilient Nesterov-Accelerated Federated Learning (Byrd-NAFL) algorithm. Byrd-NAFL seamlessly integrates Nesterov's momentum into the federated learning process alongside Byzantine-resilient aggregation rules to achieve fast, safeguarded convergence against gradient corruption. We establish a finite-time convergence guarantee for Byrd-NAFL under non-convex and smooth loss functions with a relaxed assumption on the aggregated gradients. Extensive numerical experiments validate the effectiveness of Byrd-NAFL and demonstrate its superiority over existing benchmarks in terms of convergence speed, accuracy, and resilience to diverse Byzantine attack strategies. As a promising paradigm for privacy-preserving distributed learning, federated learning (FL) leverages the parallel computational capabilities of user terminals to learn from decentralized data with the orchestration of a central server. Since its inception [1], [2], FL has proliferated across diverse application scenarios, e.g., healthcare [3], [4], mobile edge [5], [6], and autonomous driving [7], [8]. Despite its merits in preserving user privacy, the vanilla FL paradigm still faces two major challenges, namely Byzantine resilience [9], [10] and communication efficiency [11]. To robustify the FL paradigm, Byzantine-resilient aggregation rules, e.g., Krum [10], the component-wise median (CwMed) [15], Bulyan [16], and the geometric median (GeoMed) [17], are designed to enhance the trustworthiness and reliability of the FL paradigm. Another major challenge in FL lies in enhancing communication efficiency. Current communication-efficient FL algorithms can be broadly classified into three categories: (i) communication frequency reduction [18], [19], [20], [21], [22], [12], (ii) exchanged information compression [23], [24], [25], [6], and (iii) iteration reduction [20], [26], [27], [28].
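
The core server-side update can be sketched in a few lines: replace naive gradient averaging with a component-wise median and then apply a Nesterov-style momentum step. This is an illustrative reconstruction of the general idea, not the paper's exact algorithm; the learning rate, momentum form, and attack model below are assumptions.

```python
import numpy as np

def byzantine_resilient_nesterov_step(grads, x, v, lr=0.1, beta=0.9):
    """One server step: component-wise median aggregation + Nesterov momentum.

    grads : (num_workers, dim) array of worker gradients, some possibly Byzantine.
    x, v  : current iterate and momentum buffer.
    """
    g = np.median(grads, axis=0)        # CwMed: robust when a majority of workers is honest
    v_new = beta * v - lr * g           # momentum update
    x_new = x + beta * v_new - lr * g   # Nesterov look-ahead step
    return x_new, v_new

# Toy run: minimize f(x) = ||x||^2 / 2 with 10 workers, 3 of them Byzantine.
rng = np.random.default_rng(0)
dim, n_workers, n_byz = 5, 10, 3
x, v = np.ones(dim), np.zeros(dim)
for _ in range(200):
    honest = x + 0.01 * rng.standard_normal((n_workers - n_byz, dim))  # noisy true gradients
    byz = 100.0 * rng.standard_normal((n_byz, dim))                    # arbitrary corruption
    grads = np.vstack([honest, byz])
    x, v = byzantine_resilient_nesterov_step(grads, x, v)
```

With 7 honest workers out of 10, the component-wise median always falls inside the honest gradient range, so the wild Byzantine gradients never steer the update, and the iterate contracts toward the optimum.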


BiMax: Bidirectional MaxSim Score for Document-Level Alignment

Wang, Xiaotian, Utsuro, Takehito, Nagata, Masaaki

arXiv.org Artificial Intelligence

Document alignment is necessary for hierarchical mining (Bañón et al., 2020; Morishita et al., 2022), which aligns documents across source and target languages within the same web domain. Several high-precision sentence-embedding-based methods have been developed, such as TK-PERT (Thompson and Koehn, 2020) and Optimal Transport (OT) (Clark et al., 2019; El-Kishky and Guzmán, 2020). However, given the massive scale of web-mined data, both accuracy and speed must be considered. In this paper, we propose a cross-lingual Bidirectional MaxSim score (BiMax) for computing doc-to-doc similarity, improving efficiency over the OT method. On the WMT16 bilingual document alignment task, BiMax attains accuracy comparable to OT with an approximately 100-fold speed increase. We also conduct a comprehensive analysis of the performance of current state-of-the-art multilingual sentence embedding models. All the alignment methods in this paper are publicly available as a tool called EmbDA (https://github.com/EternalEdenn/EmbDA).
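
The bidirectional MaxSim idea can be illustrated directly: each sentence is matched to its most similar counterpart in the other document, and the two directional averages are combined. The equal weighting of the two directions here is an assumption; the paper's exact scoring may differ.

```python
import numpy as np

def bimax_score(src_emb, tgt_emb):
    """Bidirectional MaxSim similarity between two documents.

    src_emb, tgt_emb : (n_sents, dim) arrays of L2-normalized sentence embeddings.
    Each sentence is matched to its most similar counterpart in the other
    document; the two directional averages are then averaged.
    """
    sim = src_emb @ tgt_emb.T             # pairwise cosine similarities
    src_to_tgt = sim.max(axis=1).mean()   # best match for each source sentence
    tgt_to_src = sim.max(axis=0).mean()   # best match for each target sentence
    return 0.5 * (src_to_tgt + tgt_to_src)
```

Since the score is a single matrix product followed by row/column maxima, it avoids the iterative optimization an OT-based alignment requires, which is where the speed advantage comes from.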



Deep Survival Analysis for Competing Risk Modeling with Functional Covariates and Missing Data Imputation

Gao, Penglei, Zou, Yan, Duggal, Abhijit, Huang, Shuaiqi, Liang, Faming, Wang, Xiaofeng

arXiv.org Artificial Intelligence

We introduce the Functional Competing Risk Net (FCRN), a unified deep-learning framework for discrete-time survival analysis under competing risks, which seamlessly integrates functional covariates and handles missing data within an end-to-end model. By combining a micro-network Basis Layer for functional data representation with a gradient-based imputation module, FCRN simultaneously learns to impute missing values and predict event-specific hazards. Evaluated on multiple simulated datasets and a real-world ICU case study using the MIMIC-IV and Cleveland Clinic datasets, FCRN demonstrates substantial improvements in prediction accuracy over random survival forests and traditional competing risks models. This approach advances prognostic modeling in critical care by more effectively capturing dynamic risk factors and static predictors while accommodating irregular and incomplete data.
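
As background for the discrete-time competing-risks setting, the per-bin quantity a model like FCRN predicts is a cause-specific hazard, from which cumulative incidence functions follow. A minimal sketch of that relationship (the function name and array layout are illustrative, not taken from the paper):

```python
import numpy as np

def cumulative_incidence(hazards):
    """Cumulative incidence functions from discrete-time cause-specific hazards.

    hazards : (T, K) array; hazards[t, k] is the probability of event k in
    time bin t given survival up to t. Returns a (T, K) array of CIFs.
    """
    total = hazards.sum(axis=1)                              # overall hazard per bin
    surv = np.concatenate([[1.0], np.cumprod(1.0 - total)])  # survival just before each bin
    return np.cumsum(hazards * surv[:-1, None], axis=0)
```

The CIFs across all causes sum to one minus the overall survival probability, which is what makes competing risks different from fitting K independent survival models.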



Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions

Erturk, Eray, Kamran, Fahad, Abbaspourazad, Salar, Jewell, Sean, Sharma, Harsh, Li, Yujie, Williamson, Sinead, Foti, Nicholas J, Futoma, Joseph

arXiv.org Artificial Intelligence

Wearable devices record physiological and behavioral signals that can improve health predictions. While foundation models are increasingly used for such predictions, they have been primarily applied to low-level sensor data, despite behavioral data often being more informative due to their alignment with physiologically relevant timescales and quantities. We develop foundation models of such behavioral signals using over 2.5B hours of wearable data from 162K individuals, systematically optimizing architectures and tokenization strategies for this unique dataset. Evaluated on 57 health-related tasks, our model shows strong performance across diverse real-world applications including individual-level classification and time-varying health state prediction. The model excels in behavior-driven tasks like sleep prediction, and improves further when combined with representations of raw sensor data. These results underscore the importance of tailoring foundation model design to wearables and demonstrate the potential to enable new health applications.
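
The abstract does not spell out the tokenization strategies it optimizes; one common baseline for turning a continuous behavioral signal into discrete tokens is quantile binning, sketched here purely for illustration (the function name and bin count are assumptions, not details from the paper):

```python
import numpy as np

def quantile_tokenize(signal, n_bins=16):
    """Map a continuous behavioral signal (e.g. daily step counts) to
    discrete token ids via quantile binning, so each token is roughly
    equally frequent in the training data."""
    edges = np.quantile(signal, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(signal, edges)  # token ids in [0, n_bins - 1]
```

Quantile edges adapt the vocabulary to the signal's empirical distribution, which matters for heavy-tailed behavioral quantities where uniform bins would waste most of the vocabulary.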


TSPulse: Dual Space Tiny Pre-Trained Models for Rapid Time-Series Analysis

Ekambaram, Vijay, Kumar, Subodh, Jati, Arindam, Mukherjee, Sumanta, Sakai, Tomoya, Dayama, Pankaj, Gifford, Wesley M., Kalagnanam, Jayant

arXiv.org Artificial Intelligence

The rise of time-series pre-trained models has advanced temporal representation learning, but current state-of-the-art models are often large-scale, requiring substantial compute. We introduce TSPulse, ultra-compact time-series pre-trained models with only 1M parameters, specialized to perform strongly across classification, anomaly detection, imputation, and retrieval tasks. TSPulse introduces innovations at both the architecture and task levels. At the architecture level, it employs a dual-space masked reconstruction, learning from both time and frequency domains to capture complementary signals. This is further enhanced by a dual-embedding disentanglement, generating both detailed embeddings for fine-grained analysis and high-level semantic embeddings for broader task understanding. Notably, TSPulse's semantic embeddings are robust to shifts in time, magnitude, and noise, which is important for robust retrieval. At the task level, TSPulse incorporates TSLens, a fine-tuning component enabling task-specific feature attention. It also introduces a multi-head triangulation technique that correlates deviations from multiple prediction heads, enhancing anomaly detection by fusing complementary model outputs. Additionally, a hybrid mask pretraining is proposed to improve zero-shot imputation by reducing pre-training bias. These architecture and task innovations collectively contribute to TSPulse's significant performance gains: 5-16% on the UEA classification benchmarks, +20% on the TSB-AD anomaly detection leaderboard, +50% in zero-shot imputation, and +25% in time-series retrieval. Remarkably, these results are achieved with just 1M parameters (10-100X smaller than existing SOTA models) and allow GPU-free inference, setting a new standard for efficient time-series pre-trained models. The models can be accessed from https://huggingface.co/ibm-granite/granite-timeseries-tspulse-r1
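
The dual-space objective can be illustrated with a toy loss that penalizes reconstruction error in both the time domain and the frequency domain. This is a simplified stand-in for TSPulse's actual masked-reconstruction objective, not its implementation:

```python
import numpy as np

def dual_space_loss(x_true, x_pred):
    """Reconstruction loss in both spaces: MSE on the raw series plus MSE on
    FFT magnitudes, so errors invisible in one domain (e.g. a small periodic
    distortion) are still penalized in the other."""
    time_mse = np.mean((x_true - x_pred) ** 2)
    freq_mse = np.mean((np.abs(np.fft.rfft(x_true)) - np.abs(np.fft.rfft(x_pred))) ** 2)
    return time_mse + freq_mse
```

Training on both terms encourages representations that capture complementary structure: local shape from the time term and spectral content from the frequency term.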


Enhancing BERTopic with Intermediate Layer Representations

Koterwa, Dominik, Świtała, Maciej

arXiv.org Artificial Intelligence

BERTopic is a topic modeling algorithm that leverages transformer-based embeddings to create dense clusters, enabling the estimation of topic structures and the extraction of valuable insights from a corpus of documents. This approach allows users to efficiently process large-scale text data and gain meaningful insights into its structure. While BERTopic is a powerful tool, embedding preparation can vary, including extracting representations from intermediate model layers and applying transformations to these embeddings. In this study, we evaluate 18 different embedding representations and present findings based on experiments conducted on three diverse datasets. To assess the algorithm's performance, we report topic coherence and topic diversity metrics across all experiments. Our results demonstrate that, for each dataset, it is possible to find an embedding configuration that performs better than the default setting of BERTopic. Additionally, we investigate the influence of stop words on different embedding configurations.
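
Extracting intermediate-layer representations can be sketched as follows: given per-layer hidden states (as returned by `output_hidden_states=True` in Hugging Face models), average a chosen subset of layers and mean-pool over tokens. The layer choice and pooling below are illustrative assumptions, not the specific configurations evaluated in the paper; plain NumPy arrays stand in for the model outputs.

```python
import numpy as np

def layer_combined_embedding(hidden_states, layers=(-2, -1)):
    """Build a document embedding from selected transformer layers.

    hidden_states : list of (n_tokens, dim) arrays, one per layer.
    Averages the chosen layers element-wise, then mean-pools over tokens
    to obtain a single (dim,) vector usable as a BERTopic input embedding.
    """
    stacked = np.stack([hidden_states[i] for i in layers])  # (n_layers, n_tokens, dim)
    return stacked.mean(axis=0).mean(axis=0)                # (dim,)
```

Swapping the `layers` tuple is all it takes to sweep configurations like "last layer only" versus "average of the last four", which is the kind of variation the study compares.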